Graph Theory Approaches for Optimizing Biomedical Data Analysis Using Reproducible Workflows
Authors
Abstract
Formally, computational workflows may be understood as directed acyclic graphs (DAGs): finite graphs that contain no cycles and must be traversed in a specific direction. In this representation, each node is an individual executable command. The edges of the DAG represent execution variables (data elements such as files or parameters) that pass from upstream nodes to downstream ones.

Figure 1. Illustration of a directed acyclic graph (DAG). The DAG may be traversed from left to right, moving from node to node along the edges that connect them.

In practice, workflows are described with machine-readable serialized data objects in a general-purpose programming language (GPL), a domain-specific language (DSL), or a serialized object model for workflow description. For example, an object model-based approach may describe the steps of a workflow in JSON format with a custom syntax. This workflow description can then be parsed by an engine or executor to create the DAG representation of the workflow. The executor may then translate the directions for workflow execution into actionable jobs in which data is analyzed on a computational infrastructure or backend.

3.1. General structure of a workflow execution

There are three general steps in workflow execution: interpretation of a machine-readable workflow description, generation of the workflow DAG, and finally decomposition into individual jobs that can be scheduled for execution. At the beginning of execution, a workflow engine or interpreter is provided with the workflow description and the required inputs for execution, such as parameters and file paths (Fig. 2a). The workflow description object is then parsed and a DAG is created (Fig. 2b) containing the minimal set of nodes and edges required for computation. In addition to representing the steps in the workflow as a DAG (Fig.
2c), many workflow ontologies model computational jobs as a composite (tree) pattern in which “parent” nodes (workflows) can contain multiple child nodes, while “leaf” nodes are individual executables (Fig. 2d). The Rabix engine extends this model by generalizing “parent” nodes to include transformations of the DAG, such as when parallelization is possible at that node. The engine handles the “execution,” or parsing, of these parent jobs, while leaves are queued for scheduling and execution on a backend. This model allows for more efficient resolution of certain DAG features.

bioRxiv preprint first posted online Sep. 12, 2016; doi: http://dx.doi.org/10.1101/074708. This preprint (which was not peer-reviewed) is made available under a CC-BY-NC-ND 4.0 International license; the copyright holder is the author/funder.
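The three-step execution model above (interpret the description, build the DAG, decompose into schedulable jobs) can be sketched in Python. This is a minimal illustration, not the syntax of any real engine: the JSON layout, step names, and commands are invented for the example, and Python's standard-library `graphlib.TopologicalSorter` stands in for the engine's scheduler.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical machine-readable workflow description: each step names
# its command and the upstream steps whose outputs it consumes.
WORKFLOW_JSON = """
{
  "steps": {
    "align":    {"command": "bwa mem ref.fa reads.fq",      "depends_on": []},
    "sort":     {"command": "samtools sort aln.bam",        "depends_on": ["align"]},
    "index":    {"command": "samtools index sorted.bam",    "depends_on": ["sort"]},
    "flagstat": {"command": "samtools flagstat sorted.bam", "depends_on": ["sort"]}
  }
}
"""

def build_dag(description: str) -> dict:
    """Step 2: parse the description into a DAG, mapping each node to
    the set of upstream nodes whose edges carry data into it."""
    steps = json.loads(description)["steps"]
    return {name: set(info["depends_on"]) for name, info in steps.items()}

def schedule(dag: dict) -> list:
    """Step 3: decompose the DAG into an ordered list of jobs such
    that every job runs only after its upstream dependencies."""
    return list(TopologicalSorter(dag).static_order())

print(schedule(build_dag(WORKFLOW_JSON)))
```

Note that a topological order is not unique: here `index` and `flagstat` both depend only on `sort`, so a real executor could dispatch them in either order, or in parallel.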
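The composite (tree) job model can likewise be sketched: "parent" nodes are workflows that contain children, "leaf" nodes are executables, and resolving a parent recursively walks its children so that only leaves reach the scheduling queue. The class and method names below are illustrative, not taken from Rabix or any other engine.

```python
class Executable:
    """Leaf node: a single command to be queued for a backend."""
    def __init__(self, command: str):
        self.command = command

    def resolve(self, queue: list) -> None:
        # Leaves are not run by the engine itself; they are queued
        # for scheduling and execution on a backend.
        queue.append(self.command)

class Workflow(Executable):
    """Parent node: contains executables and/or nested workflows."""
    def __init__(self, children):
        self.children = list(children)

    def resolve(self, queue: list) -> None:
        # The engine "executes" parent jobs by walking their children;
        # nested workflows recurse, so only leaves land on the queue.
        for child in self.children:
            child.resolve(queue)

queue = []
Workflow([
    Executable("bwa mem ref.fa reads.fq"),
    Workflow([Executable("samtools sort aln.bam"),
              Executable("samtools index sorted.bam")]),
]).resolve(queue)
print(queue)
```

Generalizing the parent node, as described above, would amount to letting `Workflow.resolve` rewrite its subtree (for example, expanding one leaf into several parallel leaves) before recursing.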
Similar resources
AVOCADO: Visualization of Workflow-Derived Data Provenance for Reproducible Biomedical Research
A major challenge of data-driven biomedical research lies in the collection and representation of data provenance information to ensure reproducibility of findings. In order to communicate and reproduce multi-step analysis workflows executed on datasets that contain data for dozens or hundreds of samples, it is crucial to be able to visualize the provenance graph at different levels of aggregat...
Analysis of Resting-State fMRI Topological Graph Theory Properties in Methamphetamine Drug Users Applying Box-Counting Fractal Dimension
Introduction: Graph theoretical analysis of functional Magnetic Resonance Imaging (fMRI) data has provided new measures of mapping human brain in vivo. Of all methods to measure the functional connectivity between regions, Linear Correlation (LC) calculation of activity time series of the brain regions as a linear measure is considered the most ubiquitous one. The strength of the dependence obl...
Ontology-based Registration of Entities for Data Integration in Large Biomedical Research Projects
Large biomedical projects often include workflows running across institutional borders. In these workflows, data describing biomedical entities, such as patients, bio-materials but also processes itself, is typically produced, modified and analyzed at different locations and by several systems. Therefore, both tracking entities within inter-organizational workflows and data integration are ofte...
Integrating Automated Workflows, Human Intelligence and Collaboration
Many methods and tools have evolved for microarray analysis such as single probe evaluation, promoter module modeling and pathway analysis. Little is known, however, about optimizing this flow of analysis for the flexible reasoning biomedical researchers need for hypothesizing about disease mechanisms. In developing and implementing a workflow, we found that workflows are not complete or valuab...
Identification of mild cognitive impairment disease using brain functional connectivity and graph analysis in fMRI data
Background: Early diagnosis of patients in the early stages of Alzheimer's, known as mild cognitive impairment, is of great importance in the treatment of this disease. If a patient can be diagnosed at this stage, it is possible to treat or delay Alzheimer's disease. Resting-state functional magnetic resonance imaging (fMRI) is very common in the process of diagnosing Alzheimer's disease. In th...